Abstract: This paper describes developments in Welsh-language terminology within the education
system in Wales. Following an outline of historical terminology work, it concentrates on the 
consolidation of terminology standardization at the Language Technologies Unit, Bangor University, 
with particular reference to two projects, one concerned with terminology for school-age and further 
education, the second concerned with higher education. The developments described include 
the adoption of international standards in terminology standardization and their incorporation in 
an online terminology standardization environment and dissemination platform that enable access 
to the centralized terminological dictionaries via a number of sophisticated websites, portals and 
mobile apps featuring rich dictionary entries. Some of the issues in managing large term collections 
are explored, and usage statistics are presented for the resources described. 

1. Introduction

1.1. The Case for Terminology Standardization in Wales 

Wales is one of the constituent countries of the United Kingdom. At the time of the most recent 
census (2011) it had a population of 3 million, with 562,016 (19%) having indicated that they were 
Welsh-speakers [1]. Its education system, for which the Welsh Government has devolved responsibility, 
provides education in the medium of both English and Welsh, up to and including university level. 
According to the latest figures, over one in five primary, middle and secondary school pupils in 
Wales are enrolled in Welsh-medium education [2]. As a result, Welsh-language versions of classroom
materials, course specifications and examination papers are produced for a large proportion of the 
subjects taught up to secondary level. Most of these resources are produced by the examination
board, the Welsh Joint Education Committee (WJEC), and by a number of independent suppliers based
primarily in Wales, usually with financial assistance from the Welsh Government. In further and higher
education, Welsh-medium provision is increasing due to investment from the Welsh Government and
the establishment of a virtual Welsh-medium national college, the Coleg Cymraeg Cenedlaethol, which
has branches in universities across Wales. 

Welsh-medium education therefore encompasses a broad variety of subject domains and age 
groups, and involves multiple stakeholders and participants, many of whom are involved in producing 
Welsh-language resources. Terminology standardization, that is, the development and adoption of 
technical terms by an authoritative body for a specific purpose, is thus an essential consideration as it 
is important that Welsh-medium students experience a continuity of terminology from course book 
to examination paper, from subject to subject, and from one educational stage to the next. A lack of 
continuity would place Welsh-medium students at a disadvantage in comparison to those studying
the same course through the medium of English, where the technical terms may be more established. 

Bilingual terminological dictionaries differ from traditional bilingual general-language 
dictionaries in that they do not attempt to list all the meanings and possible target-language equivalents 
which relate to a source-language word, leaving the user to decide which of the equivalents to use. 
Rather, terminological dictionaries endeavour to prescribe a single term as the preferred label for 
a specific concept within a certain domain or subject. Therefore, they are to be used in the context of 
language for special purposes, rather than for everyday language. 
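
The contrast between the two kinds of dictionary can be made concrete with a small data-structure sketch. The Python fragment below is illustrative only: the names GeneralEntry, TermEntry and TermBank, and their fields, are assumptions of this sketch and not the schema of any dictionary described in this paper. A general-language entry lists several equivalents for a headword, whereas a terminological entry ties one concept in one domain to a single preferred term.

```python
from dataclasses import dataclass, field


@dataclass
class GeneralEntry:
    """General-language entry: one headword, many possible equivalents."""
    headword_en: str
    equivalents_cy: list[str] = field(default_factory=list)


@dataclass(frozen=True)
class TermEntry:
    """Terminological entry: one concept in one domain, one preferred term."""
    concept_en: str
    domain: str
    preferred_cy: str


class TermBank:
    """Hypothetical store allowing one preferred term per (concept, domain)."""

    def __init__(self) -> None:
        self._entries: dict[tuple[str, str], TermEntry] = {}

    def add(self, entry: TermEntry) -> None:
        key = (entry.concept_en.lower(), entry.domain.lower())
        if key in self._entries:
            # A second preferred term for the same concept is rejected.
            raise ValueError(f"preferred term already set for {key}")
        self._entries[key] = entry

    def lookup(self, concept_en: str, domain: str) -> str:
        return self._entries[(concept_en.lower(), domain.lower())].preferred_cy


# Example data taken from terms discussed later in this paper.
general = GeneralEntry("conviction", ["euogfarnu", "collfarnu"])
bank = TermBank()
bank.add(TermEntry("ice sheet", "geology", "llen iâ"))
```

With this toy data, bank.lookup("ice sheet", "geology") returns the single preferred term, while the general entry leaves the choice between its two equivalents to the user.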

1.2. Historical Terminology Work 

Awareness in the education sector of a need for Welsh terminological dictionaries dates back to the 
early twentieth century when efforts were being made to bring the language into the school curriculum. 
A report published in 1927, Welsh in Education and Life, revealed not only a general shortage of resources 
for teaching Welsh in schools and for teaching other subjects through the medium of the language 
at university, but also a lack of dictionaries and other terminological resources and the problems 
this created [3]. There was some way to go before the concept of standardizing terminology would
arrive in Wales. A year later, however, a standardized orthography for Welsh, Orgraff yr Iaith Gymraeg,
was published, stabilizing the development of the language and setting the scene for subsequent 
terminology work in education. By 1971 close to thirty lexicons were available for a range of subjects 
including biology, geography, history and physics [4] (pp. 29-35). These were produced by the WJEC 
and the University of Wales Press, the former concentrating on subjects taught in schools and the 
latter on university subjects, with some overlap between the two [5] (p. 44). These were followed in 
1973 by the first modern Welsh-language terminological dictionary, Geiriadur Termau, published by the
University of Wales Press and reflecting "the effort of many people engaged in education in Wales to 
produce lists of terms required for the teaching of a number of school subjects through the medium 
of Welsh" [6] (p. vi). This marked a major step forward because it was bi-directional: the previous 
term lists were in one direction only, reflecting the priorities of the time, namely the need to translate 
material from English to Welsh [5] (pp. 44-45). It also included information such as grammatical 
gender, which had been lacking in some older term lists deemed "inadequate" by teachers due to 
this omission [6] (p. ix). The introduction to the volume mentions that developing terminology in 
education was also a problem for other languages not previously widely used in education, but it does 
not refer in more detail to this, nor does it refer in its methodology to processes used elsewhere or to 
the concept of standardization in terminology work [5] (p. 45); [6] (pp. vi-xii). 

In the years between the publication of Geiriadur Termau and the early 1990s, many other 
terminological dictionaries appeared, produced by various institutions involved in Welsh-medium 
education, such as the aforementioned University of Wales Press and the WJEC (for a comprehensive 
list see [7]). However, as many of these volumes were conceived and published independently 
within separate institutions, competing terminological dictionaries for the same domain occasionally 
appeared, with some having been in concurrent development [8] (p. ii). 

1.3. Consolidation of Terminology Standardization in Education 

In 1993, the School Curriculum and Assessment Authority (the body then responsible for the tasks 
and statutory tests which were used to assess pupils at ages 7, 11 and 14 years) put out to tender the
work of standardizing terminology for all key stages and all subjects in the National Curriculum [9] 
(p. v); [5] (p. 46). This new standardized curriculum, introduced in the wake of the Education Reform 
Act of 1988 [10], was to be taught in state-funded schools in England and Wales. The brief for the 
terminology project was to "establish objective criteria for standardizing Welsh terminology" and to 
"develop computer-based databases to manage and store the terminology data" [5] (p. 47). The tender 
was won by the School of Education at the University of Wales, Bangor (now Bangor University), 
which set up the Canolfan Safoni Termau, a new Centre for the Standardization of Terminology, in order 
to carry out the work. This centre was later amalgamated into what is today known as the Language
Technologies Unit (LTU), one of five units which make up Canolfan Bedwyr, a Welsh language centre 
at Bangor University. 

To fulfil the first part of the brief, the Canolfan Safoni Termau looked beyond Wales and decided 
to base its objective criteria for standardization on the work of the International Organization for 
Standardization (ISO) and specifically on three of its standards: ISO 704 Terminology Work—Principles 
and Methods [11], ISO 860 Terminology work—Harmonization of concepts and terms [12] and ISO/TR 12618 
Computational aids in terminology—creation and use of terminological databases and text corpora [13]; [5] 
(p. 47). The policy of adapting these standards to suit the needs of the Welsh language has been 
described as "crucial in mainstreaming Welsh terminology work and establishing standardization 
criteria in step with current international best practice" [14] (p. 194). 

The project culminated in the release in 1998 of a single volume combining the terms from 
all subjects then taught in the National Curriculum. The volume, Y Termiadur Ysgol [9], built on 
and standardized the work already undertaken in the many separate subject-specific terminological 
dictionaries and lists that had been released in the previous decades, with the data for the first time 
being stored in database form, albeit on a local machine. This facilitated the creation of a searchable
software version of the terminological dictionary which was included on CD with the printed volume. 
The software version featured sophisticated lemmatization functionality which allowed users to search 
using inflected forms and still find the appropriate entry. 
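
How a search on an inflected form can still reach the right entry can be sketched in a few lines. Welsh words can change at the beginning as well as at the end (initial consonant mutation), so the fragment below generates possible unmutated ("radical") forms for a query and tries each against a headword index. It is a minimal sketch under stated assumptions, not the lemmatizer used in the software described here: the function names and the toy index are hypothetical, only initial mutations are handled, and word endings (plurals, conjugations) are ignored.

```python
# Reverse mapping: mutated initial -> possible radical initials.
# Covers the soft, nasal and aspirate mutations of Welsh consonants.
DEMUTATIONS = [
    ("ngh", ["c"]),     # nasal: c -> ngh
    ("mh", ["p"]),      # nasal: p -> mh
    ("nh", ["t"]),      # nasal: t -> nh
    ("ng", ["g"]),      # nasal: g -> ng
    ("ch", ["c"]),      # aspirate: c -> ch
    ("ph", ["p"]),      # aspirate: p -> ph
    ("th", ["t"]),      # aspirate: t -> th
    ("dd", ["d"]),      # soft: d -> dd
    ("g", ["c"]),       # soft: c -> g
    ("b", ["p"]),       # soft: p -> b
    ("d", ["t"]),       # soft: t -> d
    ("f", ["b", "m"]),  # soft: b -> f, m -> f
    ("l", ["ll"]),      # soft: ll -> l
    ("r", ["rh"]),      # soft: rh -> r
    ("m", ["b"]),       # nasal: b -> m
    ("n", ["d"]),       # nasal: d -> n
]


def radical_candidates(form: str) -> list[str]:
    """Return the form itself plus every radical it could be a mutation of."""
    form = form.lower()
    # Soft mutation deletes an initial g, so "g" + form is always a candidate.
    candidates = [form, "g" + form]
    for mutated, radicals in DEMUTATIONS:
        if form.startswith(mutated):
            rest = form[len(mutated):]
            candidates.extend(radical + rest for radical in radicals)
    return candidates


def find_entry(query: str, index: dict[str, str]):
    """Look up a possibly mutated query; return (headword, entry) or None."""
    for candidate in radical_candidates(query):
        if candidate in index:
            return candidate, index[candidate]
    return None


# Toy headword index (Welsh headword -> English gloss), for illustration only.
TOY_INDEX = {"cath": "cat", "gardd": "garden", "llyfr": "book", "pont": "bridge"}

print(find_entry("gath", TOY_INDEX))    # ('cath', 'cat')
print(find_entry("nghath", TOY_INDEX))  # ('cath', 'cat')
print(find_entry("ardd", TOY_INDEX))    # ('gardd', 'garden')
```

Over-generating candidates and letting the index filter them keeps the sketch short; a production lemmatizer would also need to use the lexicon to rule out spurious matches and to handle endings.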

Y Termiadur Ysgol was followed in 2006 by an expanded edition, Y Termiadur [15], featuring 
thousands of additional terms and amendments to some of the terms that had failed to gain purchase 
in the language. It included, for the first time, terms for vocational and A Level courses. In addition 
to the print and software edition, Y Termiadur now appeared in website form as a fully searchable 
terminological dictionary with many of the advanced search facilities found in the software edition. 

Work on the third iteration of the Termiadur series, Y Termiadur Addysg [16], began in 2011, 
following the publication of the Welsh Government's Welsh-medium Education Strategy which 
explicitly included terminology standardization as one of its objectives [17] (p. 18). In each iteration 
of the dictionary, the title was changed. The first title, Y Termiadur Ysgol, translates as "the School 
Terminological Dictionary", the second Y Termiadur as simply "the Terminological Dictionary", and 
the third Y Termiadur Addysg as "the Education Terminology Dictionary". The reference to education 
was dropped in the second iteration of the dictionary due to its widespread use among translators 
and others outside the school system [18] (p. 161). The current iteration, still ongoing at the time 
of writing, continues in digital format only. Y Termiadur Addysg is available online and as part of 
the Cysgliad software package for Microsoft Windows. In 2012, it was made available within apps 
for Amazon, Android and Apple iOS. Funded by the Welsh Government, its remit has expanded 
to include terminology from further education courses and domains, in addition to those found in 
primary, middle and secondary education. 

The development of terminology for Welsh-medium higher education (HE) did not take place in 
parallel to that of terminology for schools in the 1990s as consistent funding for terminology in this 
sector only began in earnest in 2009. Prior to this, a few subject-specific HE terminological dictionaries 
were published and funded by different bodies. In 2004, a Dictionary of Terms for Psychology [19] was 
printed, followed by a Dictionary of Terms for Woodland Management in 2005 [20]. These were funded 
by Bangor University in conjunction with the Wales branch of the British Psychological Society and 
the Forestry Commission Wales, respectively. A new funding partner to Bangor University became 
involved in 2008 with the publication of a Dictionary of Legal Terms [21] and the online publication of 
a Dictionary of Terms for the Creative Industries [22]. This partner was the Centre for Welsh-medium 
Higher Education (CWMHE). For some years previously, there had been calls for the establishment 
of a "federal college" to better support and consolidate Welsh-medium provision in HE. Strategies 
and proposals for its creation coalesced into the founding of the CWMHE, and ultimately led to 
the establishment in 2011 of a virtual Welsh-medium federal college, the Coleg Cymraeg Cenedlaethol 


Educ. Sci. 2016, 6, 2 




("Welsh National College"). The aim of the CWMHE and later the Coleg Cymraeg Cenedlaethol was 
to promote, develop and broaden Welsh-medium provision within Wales' existing universities by 
funding Welsh-medium PhDs and lectureship posts within them (in addition to those already funded 
by the universities themselves) as well as extending the range of subjects available to students in 
Welsh. In order to increase collaboration between different institutions involved in the development of 
Welsh-medium provision in specific domains, the CWMHE created "subject panels", whose members 
included lecturers from every Welsh university. Subject panels could apply to the CWMHE for 
grants to develop terminology, and the Dictionary of Terms for the Creative Industries was published 
with one such grant.

In 2009, the CWMHE, predicting that the demand for terminology would only
increase in future with growing numbers of Welsh-medium lecturers and students, determined that 
a more efficient and cost-effective way of fulfilling terminological needs in the HE sector would be 
to fund a single long-term terminology project with a dedicated full-time terminologist. This would 
avoid funding applications for multiple, parallel terminology projects and would ensure that all 
terminology work in the sector would be coordinated and would follow the same methodology and 
dissemination strategies. Conscious of the need to ensure consistency between the terminology taught 
in schools and that taught at university, the CWMHE chose to base its project in the LTU at Bangor 
University, where work on the Termiadur series was ongoing. This would not only foster collaboration 
between both education terminology projects but also enable them to share resources. It also fitted 
neatly with objective 5.5 of the 2010 Welsh-medium Education Strategy, which noted that the HE 
sector should contribute to joint working arrangements for the development and standardization of 
terminology in education [17] (p. 43). Initially funded for a period of two years, with the inception 
of the new Coleg Cymraeg Cenedlaethol, the project secured recurrent funding and is ongoing at the 
time of writing.

The first steps taken were to prepare the three print HE terminological dictionaries
published between 2004 and 2008 for online publication and to add these and the fourth, online HE terminological
dictionary to a new HE terminology portal, where users could search university terminology all in one 
place. These early HE terminological dictionaries included definitions, and it was determined that 
definitions would continue to be included as an integral part of future terminology work at HE level. 
More subject-specific terminological dictionaries were added in fields including international politics, 
sports science, chemistry, mathematics and physics. In 2015, it was decided that these would all be 
rebranded as one new terminological dictionary, entitled Geiriadur Termau'r Coleg Cymraeg Cenedlaethol, 
the "Coleg Cymraeg Cenedlaethol Terminological Dictionary" (referred to henceforth as the Coleg Cymraeg 
dictionary in this article) [23]. At this point, subject fields began to be displayed as sub-fields of the 
main terminological dictionary (work to tag entries by subject field is underway in the Y Termiadur 
Addysg project). Currently, the Coleg Cymraeg dictionary features 14 subjects agreed upon by the project 
board in consultation with subject panels and with reference to the priorities outlined in the Coleg 
Cymraeg Cenedlaethol's Academic Plan. In comparison, Y Termiadur Addysg covers terminology for over
70 courses in approximately 35 subjects (the exact number of subjects is open to interpretation, as some
courses such as Humanities may include elements of a number of different subjects and similar
courses, whilst film studies could be considered an aspect of media studies even though both are
offered as separate A Level courses). As a result, Y Termiadur Addysg prioritizes breadth of coverage,
while the Coleg Cymraeg dictionary prioritizes depth of knowledge. The Coleg Cymraeg dictionary with 
definitions is available online and, as of 2015, within apps available for Amazon, Android and Apple 
iOS.

Although both the Y Termiadur Addysg project and the Coleg Cymraeg dictionary project standardize
terms for the education sector and each employs a single terminologist with additional editorial and
technical support, they differ in some respects with regard to their priorities and target users. Whilst the 
Coleg Cymraeg dictionary is aimed at a fairly homogenous audience of lecturers, researchers, in-house 
university translators and undergraduate and post-graduate students, the audience of Y Termiadur 
Addysg is more diverse. In addition to professionals such as teaching staff and the translators of 
course materials and examination papers, Y Termiadur Addysg is intended to be used by students of 
all ages including primary school pupils, and the format and content of the terminological dictionary 
must therefore be accessible to those studying at all academic levels. Another issue in a bilingual
country such as Wales is that many of the parents of students attending Welsh-medium schools are 
not themselves Welsh-speakers [24], and there may be varying degrees of Welsh-language proficiency 
amongst the students themselves. This means that Y Termiadur Addysg must present its information 
to the user in a simpler manner than that which is required of the Coleg Cymraeg dictionary and it 
must provide additional user aids, such as text-to-speech, for those to whom the Welsh language is 
less familiar. 

Unlike most of Y Termiadur Addysg's intended audience, intended users of the Coleg Cymraeg 
dictionary have chosen to specialize in a specific field and consequently a higher level of expertise can 
be assumed when designing terminological dictionary entries intended for their use. HE students are 
required to possess a more detailed understanding of the concepts relating to their studies, and it is 
important that definitions of the concepts taught in Welsh-medium HE are available in Welsh. 

2. Experimental Section: Meeting the Needs 

The theoretical basis for Welsh terminology work in education is the ISO 704 standard, which 
requires that a term be linguistically correct, accurate, concise and monosemous and that it should, 
ideally, give rise to derivatives [25]. This means that not only should a term comply with linguistic 
norms, which in the case of Welsh means following the spelling guidelines found in the latest 
edition of Orgraff yr Iaith Gymraeg (1987), but that it should also reflect, as far as is possible, the 
characteristics of the concept, and that the term should refer to one concept only. Other ISO standards 
have informed the work in Wales over the years, and these have formed the basis of a number of 
guidelines produced by the LTU for various government bodies, the most recent of which appeared 
in 2007 [25]. However, perhaps the second most important ISO document with regard to Welsh term 
formation and selection is ISO 15188 on Project management guidelines for terminology standardization [26], 
which emphasizes the importance of a consensus-based approach in the validation and adoption of 
terms within subject-specialist communities [25]. As a result, both projects seek the opinion of subject 
specialists for candidate terms where there is concern over the appropriateness of one word over 
another as a label for a concept. In the Coleg Cymraeg project, many of the terms to be standardized are 
submitted to the project by the subject specialists themselves. These concepts must then be defined 
and the specialists play a major role in the process, with feedback being given either by a small number 
of individuals or, less frequently, a larger selection of lecturers from the relevant subject panel. This is 
possible as the number of terms dealt with in this project is comparatively smaller, with the result that 
a large percentage of terms is discussed with subject specialists. These specialists include individuals 
and organizations external to the HE sector, as was the case with terms for the creative industries where 
Welsh media professionals and broadcasting companies were involved in standardization. In the 
Y Termiadur Addysg project the percentage of terms brought to the attention of subject specialists is 
comparatively smaller, as the terms used in many subject fields have been well established through 
decades of teaching, and are lifted primarily from published works that have gone through an editorial 
process by publishers such as the WJEC. Y Termiadur Addysg, however, also receives ad hoc enquiries
regarding terminology from education stakeholders, including subject specialists, which are then 
incorporated into the terminological dictionary. 

2.1. Term Collection 

One of the main tasks facing the terminologist during the creation of a terminological dictionary 
is that of collecting the terms relevant to a particular subject domain. This includes source language 
terms (which in the case of Wales means English terms), and candidate target language (Welsh) terms 
which require standardization. These can be gathered from a variety of different resources of varying 
provenances, which generally include existing reference works, specific documents and support 
materials [27] (p. 116) which belong to the domain in question. Within the context of Y Termiadur 
Addysg these correspond to historic bilingual terminological dictionaries, English course books and 
their Welsh translations, annual bilingual examination papers and bilingual course specifications
(these specifications describe to educators what the students are required to learn on the course and
are therefore a valuable source of terminology). Past and yet-to-be-published examination papers and
course specifications are key resources for the project as it is vital that terms used in an examination 
are not encountered by the student for the very first time in the examination itself. 

Given that such materials have been published for Welsh-medium schools over a period of several 
decades and continue to be published regularly today, there is a wealth of parallel texts at the disposal 
of the Y Termiadur Addysg project, from which the terminologist can select English terms and candidate 
Welsh terms. Subjects covered by these resources are mostly from traditional fields (mathematics, 
religious education, history), many of which have long been taught through the medium of Welsh. 
Vocational fields, which were only recently added to the remit of the project, have fewer such resources. 

In contrast, in HE, it is only relatively recently that study materials and research have begun to 
be published in Welsh in some fields (such as solar physics, genetics and sports science). There is the 
added complication that the more scientific the area of study, the more likely it is that researchers 
will publish in English, both in order to appeal to a larger readership and in order to publish in 
prestigious international journals, an important consideration when preparing for the Research 
Excellence Framework (REF) which assesses the quality of research carried out in HE institutions in 
the UK. 

There are very few bilingual resources available to the Coleg Cymraeg dictionary project for term 
collection work. The most commonly encountered resource is term lists supplied by lecturers, which 
include English source terms and the lecturer's own suggestion for candidate Welsh terms. Another, 
much rarer possibility, is that an academic might author and submit for editing and publication 
an entire terminological dictionary for a particular subject field. This has occurred once in the history 
of the Coleg Cymraeg dictionary project, with submission in 2015 of an earth sciences terminological 
dictionary by geologist Dyfed Elis-Gruffydd [28]. 

Candidate Welsh terms may also be sourced from yet-to-be-published HE course books and 
published or yet-to-be-published articles from Welsh scholarly journals. A difficulty, however, is that 
the source English term is not provided and, given that the Welsh candidate term is often a neologism 
coined by the author, the terminologist must work backwards from the Welsh and decipher which 
concept is in question in order to arrive at a source term. The more closely the candidate term reflects 
the characteristics of the concept, the easier it is to achieve this. Articles used for such purposes are 
primarily published in Gwerddon, a peer-reviewed interdisciplinary e-journal published twice annually 
and also financed by the Coleg Cymraeg Cenedlaethol [29]. Although the Coleg Cymraeg dictionary 
project has been involved in the creation of relatively few course books, in 2015, it was involved in 
the standardization of terms for a book on the foundations of public law. This is the first in a series 
of law course books, again funded by the Coleg Cymraeg Cenedlaethol, which are set to be published 
in the coming years. Welsh candidate terms were collected from the volume and standardized in 
collaboration with its author, the project lead, the Welsh Government's Chief Jurilinguist and its 
First Legislative Counsel from 2007 to 2010. The involvement of others involved in terminology 
standardization in government helps ensure that terms used in the education sector are consistent 
with those recommended by BydTermCymru, the government's terminology website for translators [30]. 
This heralds a new working method for the Coleg Cymraeg dictionary project where the terminologist is 
involved in the creation of new course materials from their initial stages until their publication, offering
terminological aid throughout the process. 

As the HE terminology project uses materials funded by the same body as that which funds the 
terminology work, namely the Coleg Cymraeg Cenedlaethol, it avoids a problem which has affected 
Y Termiadur Addysg, and that is gaining access to the academic texts in which candidate Welsh terms 
appear. Obtaining printed copies of all such materials produced for schools is a costly venture 
and, having bought these, many are subject to copyright, leading to difficulties in using the most 
efficient means of candidate term extraction. Obtaining digital copies from producers is, for the time
being, unlikely. 

The Coleg Cymraeg dictionary project sources English terms not only from the materials previously
mentioned but also, where copyright permits, from HE terminological dictionaries produced for 
a different pair of languages (as long as English is one of these and acts as a pivot language for the 
term list). This has occurred once in the Coleg Cymraeg dictionary project, when biologist Adam Oliver 
Brown gave permission to adapt his French-Language Glossary of Biological Terminology for the
Welsh language. This was produced for students of Ottawa University and includes French definitions
and English equivalent terms [31]. 

In cases where an English term list has been collected yet no equivalent candidate Welsh terms 
have surfaced in any of the materials mentioned above, then the terminologists must find a solution. 
New Welsh technical terms may be coined, adapted from Welsh general-language words or borrowed 
from another language entirely. Finding a new Welsh term using one of these methods is rarely 
problematic. Welsh is a well-developed language that has been used as a medium of literature for 
fifteen centuries [6] (p. x). It has long-established conventions for combining prefixes, suffixes and 
word elements, and a tradition spanning nearly two millennia of borrowing from Latin, a language 
from which many technical terms draw inspiration or are derived. Conventions for the transliteration 
of chemical names, for instance, have long been established, so that "ethylenediaminetetraacetic acid" 
would unambiguously be rendered as "asid ethylendeuamintetraasetig". Welsh terminologists may 
also look to other Celtic languages for inspiration. If an equivalent term for a given concept already 
exists for example in Irish [32], this may provide a useful pattern to follow, as these languages share 
common roots and structures with Welsh. 

More problematic, perhaps, than creating new terms, is achieving consensus regarding the 
appropriate term candidate where a number of potential equivalents already exist, especially if several 
of them are in current use. Where a term candidate has gained currency and is linguistically and 
conceptually acceptable it is considered best practice (in keeping with the international standards) for 
the term candidate in question to be selected as the preferred term. Whilst terms in English generally 
gain currency and become well-established through their continued use within a certain discourse, 
the discourse in less-resourced languages may not gain enough participants for such a process to
reach a conclusion organically. For this reason, less-resourced languages can find themselves required 
to participate in a greater degree of prescription, as opposed to description, than is the case for other 
more-resourced languages. 

Problems include the use of competing forms for a single concept, a single general-language 
form being used for similar but distinct concepts within the same subject domain, or subject 
specialists deeming the term candidate(s) inappropriate for conveying the meaning of the concept. 
For example, the declaration in a court of law of a person's guilt, which in English is represented by 
the term "conviction", appears in Welsh as both euogfarnu and collfarnu. A familiar,
general-language word may come to be used for multiple technical concepts in Welsh when
translators prioritize familiarity over technical accuracy. One such example has been the
use of ffrwythloni artiffisial (equivalent to "artificial fertilization") for "artificial insemination", as strict
equivalents for "insemination" such as mewnsemenu are often seen as alien and unfamiliar. However, in
this case a "fertilization" equivalent is inappropriate as the process of insemination does not guarantee
fertilization. A candidate term deemed inappropriate by subject specialists was gorchudd iâ for "ice
sheet" in geology, as the word gorchudd refers to something which "covers" something else, and an ice
sheet (properly llen iâ) does not necessarily cover all land features, as mountains may protrude above
the ice. A lesser problem is that candidate terms may also be found not to comply with linguistic norms,
although usually this is simply caused by orthographic matters such as the incorrect use of accented 
characters or hyphens and is therefore easily rectified. 
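
The first two problems lend themselves to a simple automated check over a list of candidate terms. The sketch below is offered as an illustration only and is not a description of the tooling used by either project: the function names are hypothetical, the data are the examples discussed above, and the domain labels "law" and "biology" are assumptions added for the sake of the example. It flags concepts that have more than one Welsh candidate, and Welsh forms that serve more than one source term within a domain.

```python
from collections import defaultdict

# (English source term, domain, candidate Welsh term).
# Domain labels other than "geology" are assumptions of this sketch.
CANDIDATES = [
    ("conviction", "law", "euogfarnu"),
    ("conviction", "law", "collfarnu"),
    ("artificial insemination", "biology", "ffrwythloni artiffisial"),
    ("artificial fertilization", "biology", "ffrwythloni artiffisial"),
    ("ice sheet", "geology", "llen iâ"),
]


def competing_forms(candidates: list[tuple[str, str, str]]) -> dict:
    """Concepts (source term + domain) with more than one Welsh candidate."""
    by_concept = defaultdict(set)
    for en, domain, cy in candidates:
        by_concept[(en, domain)].add(cy)
    return {key: sorted(forms) for key, forms in by_concept.items() if len(forms) > 1}


def overloaded_forms(candidates: list[tuple[str, str, str]]) -> dict:
    """Welsh forms used for more than one source term within one domain."""
    by_form = defaultdict(set)
    for en, domain, cy in candidates:
        by_form[(cy, domain)].add(en)
    return {key: sorted(terms) for key, terms in by_form.items() if len(terms) > 1}


print(competing_forms(CANDIDATES))
# {('conviction', 'law'): ['collfarnu', 'euogfarnu']}
print(overloaded_forms(CANDIDATES))
# {('ffrwythloni artiffisial', 'biology'): ['artificial fertilization', 'artificial insemination']}
```

A check of this kind can only surface candidates for discussion. Deciding which form becomes the preferred term, or whether a form is conceptually appropriate, remains a matter for the terminologist and subject specialists.
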

The sources used for term collection purposes in Welsh terminology work in the education sector
differ according to the project in question. In the same way, the methodology of term collection also 
differs, with manual or semi-automated extraction procedures being used. 
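
As a rough illustration of what a semi-automated procedure can involve, the sketch below covers the first step described in Section 2.2: finding word forms that are absent from a lexicon and ranking them by frequency so that the most common are reviewed first. It is a sketch under stated assumptions rather than the project's pipeline. It uses only the Python standard library instead of an NLTK corpus, the function name and the tiny lexicon are hypothetical, and a simple stoplist stands in for the filtering of names, boilerplate and similar noise.

```python
import re
from collections import Counter

# Letters (including accented ones), optionally joined by hyphen or apostrophe.
TOKEN = re.compile(r"[^\W\d_]+(?:[-'][^\W\d_]+)*")


def unrecognised_forms(texts, lexicon, stoplist=frozenset()):
    """Count word forms missing from the lexicon, most frequent first."""
    counts = Counter()
    for text in texts:
        for token in TOKEN.findall(text.lower()):
            if token not in lexicon and token not in stoplist:
                counts[token] += 1
    return counts.most_common()


# Toy data: a three-"document" corpus and a deliberately incomplete lexicon.
texts = ["Mae'r llen iâ yn toddi", "Mae llen iâ yn fawr", "Student Name:"]
lexicon = {"mae", "mae'r", "yn", "iâ", "fawr"}
stoplist = {"student", "name"}  # boilerplate to be ignored

print(unrecognised_forms(texts, lexicon, stoplist))
# [('llen', 2), ('toddi', 1)]
```

The ranked list is only a queue for manual classification: whether a given form is a Welsh word, an English word, a fragment or a name is still decided by the terminologist.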

2.2. Term Extraction 

Welsh term extraction, as previously mentioned, should in this context be considered to be term 
candidate extraction, as no judgement is made during the extraction stage as to the termhood of 
the recorded word form. The extraction method employed within the Y Termiadur Addysg project 
depends on a number of considerations including the availability of digital versions of resources 
as well as their copyright status. Where digital copies are available and the copyright situation is 
favourable, semi-automated term extraction is possible using natural language processing (NLP) 
techniques. With Welsh texts, the first step is identifying unrecognised word forms that are not 
present in the LTU's lexicon of Welsh word forms. The texts are converted into a categorised NLTK 
corpus where information such as the subject domain and academic level of the text can be retained. 
Named entities such as personal names, place names and product names are filtered out at this stage, 
along with common misspellings, English words that are not also Welsh words, boilerplate text 
(for example "Student Name:"), and so forth. A simple software interface is then used to assist the 
terminologist in manually classifying the remaining unrecognised word forms into categories such as 
unrecognised Welsh word, unrecognised English word, word fragment, unrecognised place name etc. Word 
forms are classified in order of their frequency within the corpus, and can then be added to the general 
lexicon and assigned a part of speech, plural form or conjugation pattern, and so on. In this manner, 
unrecognised forms in the corpus can be processed relatively quickly into term candidates. Once these 
unrecognised forms have been added to the lexicon, the corpus can be lemmatized using the LTU's 
lemmatizer [33]. Lemmatization is the process of converting inflected forms such as mice and swam 
in English to their canonical forms of mouse and swim. This process is made more complicated in 
Welsh due to the fact that a word's beginning can inflect as well as its ending, a phenomenon known 
as initial consonant mutation. Following lemmatization, statistical techniques using bigrams and 
trigrams and tf.idf can be used to identify likely term candidates, and, within a categorised corpus, 
help determine to which subject domain a term belongs. English term extraction follows a similar 
approach, with the benefit that English NLP tools are more widely available. Parallel bilingual subject 
domain corpora have also been created and are used as research tools by the terminologists. Although 
the Coleg Cymraeg dictionary has not made much use of NLP techniques as yet, permission has been 
obtained to convert the archived issues of Gwerddon, the Coleg Cymraeg Cenedlaethol’s Welsh-medium 
journal, into a corpus for use in concordance searches and term candidate recognition. Another useful 
resource which has implications for the Coleg Cymraeg dictionary project is the creation and expansion 
of a corpus of searchable academic texts through the DECHE Digitising, E-publishing and Language 
Corpus project [34]. The LTU provides a number of additional, publicly available online monolingual 
and bilingual searchable corpora as research tools, although these are limited to texts whose copyright 
terms allow for this usage, or whose copyright has expired. 
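
The statistical step described above can be illustrated with a minimal Python sketch. It is not the LTU's implementation: it assumes that each subject domain in the categorised corpus is treated as a single document, that the tokens have already been lemmatized, and that bigrams are scored with a plain tf.idf formula. The function names and the toy English data are invented for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all n-grams (as tuples) found in a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_by_domain(corpus, n=2):
    """Score every n-gram in each domain, treating each domain as one document.

    `corpus` maps a subject-domain label to a list of lemmatized tokens.
    """
    counts = {domain: Counter(ngrams(tokens, n)) for domain, tokens in corpus.items()}
    # Document frequency: in how many domains does each n-gram occur?
    doc_freq = Counter()
    for counter in counts.values():
        doc_freq.update(counter.keys())
    scores = {}
    for domain, counter in counts.items():
        total = sum(counter.values()) or 1
        scores[domain] = {
            gram: (freq / total) * math.log(len(corpus) / doc_freq[gram])
            for gram, freq in counter.items()
        }
    return scores


def best_domain(scores, gram):
    """Return the domain in which the n-gram scores highest, or None if unseen."""
    candidates = {domain: s[gram] for domain, s in scores.items() if gram in s}
    return max(candidates, key=candidates.get) if candidates else None


# Toy, invented data: two tiny "domains" of already lemmatized English tokens.
toy_corpus = {
    "biology": "the cell divide and the cell wall protect the cell".split(),
    "geography": "the ice sheet melt and the ice sheet retreat".split(),
}
toy_scores = tfidf_by_domain(toy_corpus)
print(best_domain(toy_scores, ("ice", "sheet")))
print(toy_scores["geography"][("and", "the")])
```

A bigram that occurs in every domain, such as ("and", "the") here, receives a score of zero, while domain-specific bigrams rise to the top. Function-word sequences such as ("the", "ice") also score well in this naive version, so a practical pipeline would probably add a stop-word or part-of-speech filter before presenting candidates to a terminologist.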

Unfortunately, the use of natural language processing techniques is not always feasible. 
With printed publications, terms must often be extracted manually as copyright or general availability 
issues mean that digital copies are often not available. Where copyright issues exist, the creation of 
digital copies through scanning and using optical character recognition (OCR) techniques would not 
be legal. Where this is not an issue, modern OCR can be used quite effectively with Welsh as with 
English, allowing for successful digitization. However, the complicated layouts of many course books, 
with multiple columns, images and boxouts, can make post-editing OCR output too time-consuming to be 
worthwhile. One of the problems encountered when extracting new terms manually from a text is 
that the same term may arise many times. Without using technology it is difficult to track whether 
a term has already been recorded or not, leading to repeated manual recording of the same term. 
To accelerate the manual term extraction process in the Y Termiadur Addysg project, an autocomplete 
feature was added to the term extraction interface. This is linked to the list of term candidates 
previously recorded and allows the operator to establish quickly, with only a couple of keystrokes, 
whether a term has already been collected, and to record the specific instance of the term should term 
frequencies need to be recorded. 
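
One way such an autocomplete lookup could work is sketched below in Python. The source does not describe the actual interface, so the class name, its methods and the sample terms are all hypothetical; the sketch simply keeps recorded candidates in a sorted list, finds prefix matches by binary search, and counts each recorded instance.

```python
import bisect
from collections import Counter


class CandidateIndex:
    """Sorted store of recorded term candidates with prefix lookup and counts."""

    def __init__(self):
        self._terms = []          # kept sorted so that prefixes form a contiguous run
        self._counts = Counter()  # number of recorded instances per candidate

    def record(self, term):
        """Record one instance of a term, adding it to the index if it is new."""
        key = term.strip().lower()
        if key not in self._counts:
            bisect.insort(self._terms, key)
        self._counts[key] += 1
        return self._counts[key]

    def complete(self, prefix, limit=5):
        """Return up to `limit` recorded candidates that start with `prefix`."""
        key = prefix.strip().lower()
        start = bisect.bisect_left(self._terms, key)
        matches = []
        for term in self._terms[start:start + limit]:
            if not term.startswith(key):
                break
            matches.append(term)
        return matches

    def frequency(self, term):
        """Return how many instances of the term have been recorded so far."""
        return self._counts[term.strip().lower()]


index = CandidateIndex()
for seen in ["ice sheet", "ice cap", "insemination", "ice sheet"]:
    index.record(seen)
print(index.complete("ic"))
print(index.frequency("ice sheet"))
```

Typing "ic" returns the two candidates already recorded with that prefix, so the operator can select the existing entry and increment its count rather than record the term a second time.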

2.3. Ensuring Consistency 

Consistency in terminology between levels of education, subjects and educational institutions 
is a high priority, as it is vital to ensure that children moving from one stage of education into the 
next are not confronted with new terms which differ greatly from terms used for related concepts 
already discussed in earlier years. Although in Wales there are only two terminological dictionary 
projects which are specific to the education sector, many other terminological dictionaries have been 
commissioned over the years by other bodies in other sectors, and these contain terminology which is 
relevant to students and others involved in education. In 2010, the LTU launched its Welsh National 
Terminology Portal, which features 18 terminological dictionaries developed by the unit itself and its 
approved partners [35]. Many of these had previously been available only in hard copy. The portal 
allowed users, for the first time, to search all of these dictionaries simultaneously using a single search 
box. Although a boon to terminologists, translators and many others, an unexpected side-effect of 
this new, powerful search option was that it served in some instances to highlight inconsistencies 
between terminological dictionaries that were meant to be consistent. Such inconsistencies included 
two different preferred terms being given for a single concept across two terminological dictionaries, or 
the same term being assigned a different part of speech. As a result, a program was developed within the LTU to identify potential examples of 
these inconsistencies and bring them to the attention of the terminologists. 
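
The kind of check described above could be sketched as follows in Python. The LTU's program is not described in detail, so this is only an assumed approach: entries are grouped by source term, and any source term that appears in more than one dictionary with differing preferred terms or parts of speech is reported. The `Entry` fields, the dictionary names and the sample records (including their part-of-speech labels) are invented for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Entry(NamedTuple):
    dictionary: str       # hypothetical dictionary label
    source_term: str      # English source term
    preferred_term: str   # Welsh preferred term
    part_of_speech: str   # placeholder label


def find_inconsistencies(entries):
    """Map each source term to the fields on which dictionaries disagree."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry.source_term.lower()].append(entry)

    report = {}
    for source, group in grouped.items():
        # Only compare terms that occur in at least two different dictionaries.
        if len({e.dictionary for e in group}) < 2:
            continue
        problems = []
        if len({e.preferred_term for e in group}) > 1:
            problems.append("preferred term")
        if len({e.part_of_speech for e in group}) > 1:
            problems.append("part of speech")
        if problems:
            report[source] = problems
    return report


sample = [
    Entry("Dictionary A", "conviction", "euogfarnu", "noun"),
    Entry("Dictionary B", "conviction", "collfarnu", "noun"),
    Entry("Dictionary A", "ice sheet", "llen iâ", "noun"),
    Entry("Dictionary B", "ice sheet", "llen iâ", "noun"),
    Entry("Dictionary A", "seal", "sêl", "noun"),
    Entry("Dictionary B", "seal", "sêl", "verb"),
]
print(find_inconsistencies(sample))
```

Because a single source form can legitimately stand for distinct concepts, a report of this kind can only list potential inconsistencies; the decision on whether a flagged pair is a genuine conflict remains with the terminologists.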

2.4. Definitions 

Term collection and standardization are common to both the terminology projects currently 
underway in the Welsh education sector; however, a further element of terminology work is unique to 
the HE terminology work, namely definition writing. The decision to include definitions depends on 
project priorities and resources. The priority for the Termiadur series has, since the beginning, been 
to standardize a great many terms within the timeframe of the project, in order to fill a considerable 
gap in the terminology required for the education sector. However, the Coleg Cymraeg dictionary 
was developed later when the groundwork for terminology in education had already been carried 
out. It was therefore possible to concentrate on a smaller number of concepts often within a more 
specialized field, and provide in-depth information about them through the inclusion of definitions. 

In the Coleg Cymraeg dictionary, initial drafts of definitions are prepared either by the terminologist, 
using reference books and articles, websites and other terminological dictionaries as a guide, or by 
the subject specialist, using his or her own knowledge of the concept. The draft is then discussed by 
both the terminologist and subject specialists and fine-tuned, to ensure that it is accurate and clear, 
and that it complies with ISO 704. The definition is, in most cases, translated so that it is available in 
both Welsh and English. This process tends to highlight any unclear or ambiguous phrasing, which is 
then corrected; providing the definition bilingually therefore often increases the clarity of the concept. 
Compliance with ISO 704 means that the following problematic structures are avoided: 

• incomplete definitions which are too broad in scope; 

• negative definitions, which explain what the concept is not, without explaining what it is; 

• circular definitions which repeat the term within the definition and do not add to the reader's understanding of the concept. 
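
Two of the structures listed above lend themselves to a simple automated pre-check, sketched here in Python. This is not part of the workflow described in the source, and it is no substitute for review against ISO 704: the function name, the list of negative openings and the sample definitions are assumptions made for illustration, and the check for overly broad definitions is left to human judgement.

```python
import re

# Assumed, non-exhaustive openings that suggest a purely negative definition.
NEGATIVE_OPENINGS = ("not ", "is not ", "anything that is not ", "something that is not ")


def review_definition(term, definition):
    """Return a list of warnings ('circular', 'negative') for a draft definition."""
    warnings = []
    # Circular: the term itself is repeated as a whole phrase inside the definition.
    pattern = r"\b" + re.escape(term) + r"\b"
    if re.search(pattern, definition, flags=re.IGNORECASE):
        warnings.append("circular")
    # Negative: the definition opens by stating what the concept is not.
    if definition.strip().lower().startswith(NEGATIVE_OPENINGS):
        warnings.append("negative")
    return warnings


print(review_definition("ice sheet", "An ice sheet that lies over land"))
print(review_definition("ice sheet", "A mass of glacial ice extending over a large land area"))
print(review_definition("conviction", "Not an acquittal"))
```

For Welsh definitions the string match would need to operate on lemmatized text, since a mutated or inflected occurrence of the term would otherwise go unnoticed.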

Definitions include the essential characteristics of the concept and provide sufficient information 
so that the student may be able to identify a concept and differentiate between it and other similar 
concepts, as well as understand the relationship between related concepts. Definitions often include 
rich text features made possible by recent technical developments to the in-house system, Maes T. 
These include clickable cross-references to related concepts, the addition of graphical elements such 
as diagrams and photographs and the inclusion of mathematical formulae. More than one method 
of enabling these was explored; however, the one which best suited the needs of the Coleg Cymraeg 
dictionary was achieved by extensive use of Markdown as well as the use of LaTeX and a small subset 
of HTML elements, such as the "img" tag. 

3. Results and Discussion 

3.1. Dissemination of Terms 

Recent years have seen great changes in the technologies used to construct, standardize 
and distribute terminological resources, especially in the field of web-based services and mobile 
connectivity, increasing the technological demands placed on those developing the resources. In order 
to deal with these demands, the Welsh terminology projects for the education sector share resources 
and technical personnel. A software developer is part-funded by both projects to develop and improve 
the in-house terminology standardization and dissemination platform, Maes T, the Welsh terminology 
app, and the web-based dictionaries. 

The Maes T system is an online terminology development interface for creating, editing and 
publishing dictionary entries [36]. One of the main drivers for its creation was the need to enable teams 
of geographically dispersed subject specialists to contribute to standardization, in order to adhere 
to the consensus-based, concept-led principles that underpin modern terminology standardization 
work. Using this shared platform ensures consistency in standardization methodology across both the 
Y Termiadur Addysg and the Coleg Cymraeg projects. It is used for storing all the required term data, 
including collected source terms and candidate target terms, definitions and linguistic information. 
An invaluable feature of its design is that it allows discussions about term candidates and definitions 
to take place between subject specialists and terminologists, archiving all such information for future 
reference. Members of the public do not have access rights to this system; only published terms 
and certain data fields (excluding discussions about the suitability of candidate terms) are visible to 
end users of the dictionaries. Maes T serves as a platform from which to disseminate dictionaries to 
websites belonging to commissioners of terminology work, to the Welsh National Terminology Portal, 
and to apps. 

Y Termiadur Ysgol and the Coleg Cymraeg dictionary are disseminated exclusively in digital format, 
for a number of reasons. Firstly, digital editions eliminate the cost of publishing print dictionaries. 
Secondly, they allow the dictionaries' content to be updated: new entries may be added and amendments 
made to any terms which have changed over time due to the adoption and use of a different term by the public. 
This situation arises in the case of terms from fields such as IT and sports, where terms trickle down to 
the general public, unlike terms in domains which remain primarily the preserve of subject specialists. 
Such amendments are tagged in the database and added to a list of amendments on the website. 
Thirdly, publishing online and in apps allows users instant access to terms on devices they carry 
everywhere on their person, making the dictionaries much more convenient and portable. The move towards 
publishing on apps is a more recent development and was driven by two factors: students 
and lecturers were requesting that content be made available in this medium, and it allowed 
for content to be stored on a device and accessed without an internet connection, unlike websites. 
The Maes T platform, developed in house by the LTU, is used for publication of dictionary entries. 

Y Termiadur Addysg is disseminated to its own dedicated dictionary website [16], while the 
Coleg Cymraeg dictionary is disseminated to the Coleg Cymraeg Cenedlaethol's institutional website [23]. 
Each website hosts its own separate terminological dictionary in a fully searchable form powered by 
the LTU's terminology distribution platform (for more detail see [35]). The search facility is enhanced 
with lemmatization so that users searching with an inflected search word can find the appropriate 
dictionary headword form, an especially useful feature for learners and the non-Welsh speaking 
parents of Welsh-medium students who may not recognise the relationship between an inflected form 
and the canonical form, especially when (due to initial consonant mutation) those forms may not begin 
with the same letter. In addition to searching for the required terms, users can also browse the terms in 
alphabetical order. Whilst both websites share much of the underlying architecture and layout, due 
to the differing needs of their users and the nature of the content found in each dictionary, entries 
differ in some regards. Y Termiadur Addysg's entries for example feature audio versions of the terms 
that can be listened to by clicking on a button within the dictionary entry, a feature not shared by the 
Coleg Cymraeg dictionary. This facility was requested by the Welsh Government specifically to help 
learners and non-Welsh speakers, as it can be difficult for those not accustomed to Welsh orthography 
to associate the written form of certain letters with their sound when spoken. For example, 
the Welsh digraph "dd" corresponds to the voiced dental fricative "th" found in English words such as 
"the". The audio entries were produced using text-to-speech software developed by Ivona for the Royal 
National Institute of Blind People (RNIB) as part of a Welsh Government grant. Whilst the synthesized 
speech is not always perfect, it represents the most natural-sounding Welsh synthetic speech engine 
currently available, and the software can be obtained for free for non-commercial purposes from the 
RNIB. Whilst Y Termiadur Addysg's entries feature audio versions of the headwords, they do not feature 
definitions, a feature seen in the majority of the Coleg Cymraeg dictionary's entries. This reflects the 
difference in priorities between the two projects previously mentioned. Without definitions, Y Termiadur 
Addysg uses disambiguation text to differentiate between multiple concepts that share the same word 
form. For example: 

seal (=piece of wax etc.) sêl 

seal (=sea mammal) morlo 

Whilst the Coleg Cymraeg dictionary also uses disambiguation texts to some degree, definitions 
and the tagging of distinct subject domains make their use less necessary. 
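
The lemmatized search facility mentioned earlier in this section has to cope with initial consonant mutation, and the Python sketch below shows one simple way of relating a mutated search word to its dictionary headword. It is not the LTU's lemmatizer: it reverses only the standard soft, nasal and aspirate mutations of the initial consonant, ignores inflected endings, and deliberately over-generates candidates that are then filtered against a headword list. The function names and the small headword set are assumptions made for illustration.

```python
# (mutated initial, radical initial) pairs for the standard Welsh mutations.
REVERSE_MUTATIONS = [
    # soft mutation
    ("b", "p"), ("d", "t"), ("g", "c"), ("f", "b"), ("f", "m"),
    ("dd", "d"), ("l", "ll"), ("r", "rh"),
    # nasal mutation
    ("mh", "p"), ("nh", "t"), ("ngh", "c"), ("m", "b"), ("n", "d"), ("ng", "g"),
    # aspirate mutation
    ("ph", "p"), ("th", "t"), ("ch", "c"),
]


def radical_candidates(form):
    """Return possible unmutated (radical) forms for a search word."""
    form = form.strip().lower()
    # The word itself, plus a restored initial "g" (soft mutation deletes "g").
    candidates = [form, "g" + form]
    for mutated, radical in REVERSE_MUTATIONS:
        if form.startswith(mutated):
            candidates.append(radical + form[len(mutated):])
    return candidates


def lookup(form, headwords):
    """Return the headwords that the search word could be a mutated form of."""
    found = []
    for candidate in radical_candidates(form):
        if candidate in headwords and candidate not in found:
            found.append(candidate)
    return found


headwords = {"morlo", "gorchudd", "llen", "sêl"}
for query in ["forlo", "orchudd", "len", "sêl"]:
    print(query, "->", lookup(query, headwords))
```

The first three queries begin with a different letter from their headwords (morlo, gorchudd and llen respectively), which is exactly the situation in which a learner searching by the written form would otherwise find nothing.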

In addition to being available within both these websites, the terminological dictionaries are also 
aggregated within the Welsh National Terminology Portal. For many users, this makes the portal 
rather than the project website the first port of call for terminological enquiries as it obviates the need 
for identical searches on multiple websites and displays the resulting dictionaries' entries together on 
the same screen for ease of comparison. 

In 2012, Y Termiadur Addysg was made available along with a general-language dictionary as part 
of an app entitled Ap Geiriaduron ("Dictionaries App") for Google's Android operating system, Amazon 
OS and Apple iOS. This app was created by a graduate developer who was funded to work at the LTU 
by a Graduate Opportunities Wales grant. It proved so popular that the dedicated Y Termiadur Addysg 
app was later discontinued as users preferred to install a single dictionaries app, especially on devices 
with limited available memory. In 2015 the Coleg Cymraeg's dictionary was added to the Ap Geiriaduron, 
with the update also introducing the features unique to that project, namely definitions with images, 
cross-references and support for complex mathematical formulae using MathJax. In addition to offline 
searching of the installed dictionaries, when connected to WiFi users can choose to search the Welsh 
National Terminology Portal from within the app. 

An issue which had to be addressed when moving from online dissemination to apps was the 
low storage space on devices such as mobile phones. This was not a problem initially as no definitions 
were included in the first version of the Ap Geiriaduron. With the move to include the Coleg Cymraeg 
dictionary, however, the amount of space required for downloading definitions had to be considered. 
This was resolved by optimizing the database schema and keeping the size of definition data to 
a minimum: definitions are shipped as Markdown and LaTeX source, which are compact formats, and are 
rendered on the device rather than on the server. Despite such challenges, the development of 
an app interface for the terminological dictionaries has proved opportune as the use of mobile devices 
to access digital media has increased significantly in recent years, a tendency which is visible in the 
usage statistics. 

3.2. Evaluating the Work 

The main concern in terminology standardization work carried out for Welsh-medium education 
is that the required terms be rigorously standardized and disseminated to stakeholders as quickly as 
possible. Evaluating the impact of this work on the stakeholders and chronicling developments in 
Welsh terminology for other terminologists outside Wales is a secondary consideration. 

Having said that, some qualitative and quantitative evaluation has been carried out. A case 
study entitled "Welsh Lexicography and Terminology" was submitted to the REF 2014 Impact 
Assessment exercise as part of the Bangor University School of Linguistics' submission. It included 
Y Termiadur Addysg as one of its two principal outputs. Together with the LTU's work on digitizing 
a general-language dictionary entitled The Welsh Academy English-Welsh Dictionary: Geiriadur yr Academi 
(1995), it contributed directly to Bangor University gaining second place overall in the UK in terms of 
social, economic and cultural impact in the Modern Languages and Linguistics Unit of Assessment [37]. 

With regard to quantitative evaluation, some statistics on the usage of terminological dictionaries 
for the education sector have been collected, but it is not always possible to draw direct comparisons 
between them all. The following table presents an overview of the number of searches recorded on 
the Y Termiadur Addysg website, the Coleg Cymraeg dictionary website and on the Welsh National 
Terminology Portal, which includes both these dictionaries (see Table 1). The figures date from 
September 18th, 2015 and are taken from Prys, Prys and Jones [38] (p. 357). 


Table 1. Website search figures. 

| Website | Launch Date | Total Searches | Average Searches/Month |
|---|---|---|---|
| Y Termiadur Addysg | August 2011 | 568,136 | 11,595 |
| Coleg Cymraeg Dictionary | March 2010 | 18,824 | 285 |
| Welsh National Terminology Portal | March 2010 | 836,414 | 12,673 |

These figures are very encouraging for both the Y Termiadur Addysg and the National Portal 
websites. There are several possibilities which would account for the disparity between the search 
figures for these two sites and for the Coleg Cymraeg dictionary website. The number of school-aged 
pupils who study through the medium of Welsh far exceeds that of Welsh-medium university students. 
The Termiadur series is, in addition, a longer-running and therefore better-known resource. 
University students who are already familiar with Y Termiadur Addysg may continue using it, or 
use the National Terminology Portal to access it and the Coleg Cymraeg dictionary simultaneously. 
Anecdotal evidence also suggests that users such as translators prefer to use the National Portal for 
speed and convenience, as it eliminates the need to search multiple terminological dictionary sites. 
A final consideration is that the Y Termiadur Addysg website and the National Portal are dedicated 
terminological dictionary sites. The Coleg Cymraeg website features a host of valuable resources 
for university students, so much so that students may not always be aware of the exact range of 
resources available. Perhaps increased marketing of the Coleg Cymraeg dictionary website would 
increase traffic to it; in fact, in a survey of the Coleg Cymraeg terminology service and resources 
carried out in 2015, the majority of the 111 respondents (primarily lecturers and students) believed the 
terminological dictionary website to be insufficiently marketed [39]. With the launch of the dictionary 
in the Ap Geiriaduron, recent marketing efforts have concentrated on this platform. Such efforts 
include advertising the app on Twitter and Facebook and, in Welsh universities, advertising on large 
information screens such as in libraries and student service departments, as well as on screensavers 
used in student computer rooms. 

It is not possible to make direct comparisons between the statistics of the websites above and 
those available for the Ap Geiriaduron as information about the searches undertaken by users on their 
devices is not sent back to the LTU's servers (due to security and privacy considerations). Since the 
launch of the app in late 2012, however, it has been downloaded over 50,000 times, the vast majority 
of these downloads occurring in the UK. This is a significant figure, given the total number of Welsh 
speakers. Approximately 70% of the downloads were for iOS (the operating system found on iPads 
and iPhones), 28% for Android, and 2% for Amazon's version of Android, FireOS. 

These usage statistics, together with the REF results with regard to social impact, are clear 
indications that terminology for all levels of education is not only deemed vital by those who aim 
to develop Welsh-medium provision (i.e., commissioners of terminological dictionaries), but is also 
considered a key resource by all those involved in education, be they teachers, lecturers, pupils, 
students, parents or translators. 

4. Conclusions 

Important steps have been taken in terminology standardization for Welsh-medium education 
since the early 1990s. These include the adoption of international standards and their implementation 
within the collaborative terminology development environment for Welsh, and the creation of 
a dissemination platform to deliver standardized Welsh-medium terminological dictionaries in 
a number of different websites and apps to cater for the varied needs of different clients and users. 
The consolidation of terminology standardization work at the LTU and the continued funding of both 
the Y Termiadur Addysg project and the Coleg Cymraeg dictionary project have played a key role in these 
successes. However, the creation of standardized terminological dictionaries and the establishment of 
the underlying support infrastructure is but a first step. Fields of study change, and resources such as 
examination papers and updated course specifications are produced in regular cycles. New subjects 
such as music technology may be offered in the language for the first time. This creates a steady 
demand for new and updated terminology, reflecting the fact that the standardization of terminology 
is a continuous process rather than a task that can eventually be brought to a permanent conclusion. 

Ensuring consistency of terminology and efficiency in term candidate identification is a challenge 
when managing multiple terminological dictionaries that include thousands of entries. This is 
increasingly being addressed with the use of natural language processing techniques. However, 
the development of NLP is central to the remit of neither the Y Termiadur Addysg project nor the 
Coleg Cymraeg dictionary project, and the focus of both projects must therefore remain on the creation 
and standardization of terminological dictionary entries. Another issue is that of successfully engaging 
with all of the potential users, as the projects receive no dedicated marketing budget. Despite this, the 
usage statistics are encouraging, especially those for the app, and it is hoped that the projects will be able 
to build on the solid foundation that has been laid by continuously expanding the number of terms 
available, keeping abreast of the relevant technologies and improving the marketing of resources so 
that all potential users are aware of the terminological dictionaries available to them. 